A variable-length quantity (VLQ) is a universal code that uses an arbitrary number of binary octets (eight-bit bytes) to represent an infinitely large integer. It was defined for use in the standard MIDI file format[1] to save additional space for a resource constrained system, and is also used in the later Extensible Music Format (XMF). A VLQ is essentially a base-128 representation of an unsigned integer with the addition of the eighth bit to mark continuation of bytes. See the example below.
Base-128 is also used in ASN.1 BER encoding to encode tag numbers, and it is used in the WAP environment, where it is called variable length unsigned integer or uintvar. The DWARF debugging format[2] defines a variant called LEB128 (or ULEB128 for unsigned numbers), where the least significant group of 7 bits are encoded in the first byte and the most significant bits are in the last byte. Google's protocol buffers use a similar format to have compact representation of integer values[3].
Contents |
The encoding assumes an octet (an eight-bit byte) where the most significant bit (MSB), also commonly known as the sign bit, is reserved to indicate whether another VLQ octet follows.
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|
27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 |
A | Bn |
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.
B is a 7-bit number [0x00, 0x7F] and n is the position of the VLQ octet where B0 is the least significant. The VLQ octets are arranged most significant first in a stream.
In the data format for Unreal Packages used by the Unreal Engine, a variable length quantity scheme called Compact Indices[4] is used. The only difference in this encoding is that the first VLQ has the sixth binary digit reserved to indicate whether the encoded integer is positive or negative.
7 | 6 | 5 | 4 | 3 | 2 | 1 | 0 |
---|---|---|---|---|---|---|---|
27 | 26 | 25 | 24 | 23 | 22 | 21 | 20 |
A | B | Cn |
If A is 0, then this is the last VLQ octet of the integer. If A is 1, then another VLQ octet follows.
If B is 0, then the VLQ represents a positive integer. If B is 1, then the VLQ represents a negative number.
C is a 6-bit number [0x00, 0x3F] and n is the position of the VLQ octet where C0 is the least significant. The VLQ octets are arranged most significant first in a stream.
Any consecutive VLQ octet follows the general structure.
Here we work an example with the number 137:
The Standard MIDI File format specification gives more examples:
Integer | Variable-length quantity |
---|---|
0x00000000 | 0x00 |
0x0000007F | 0x7F |
0x00000080 | 0x81 0x00 |
0x00002000 | 0xC0 0x00 |
0x00003FFF | 0xFF 0x7F |
0x00004000 | 0x81 0x80 0x00 |
0x001FFFFF | 0xFF 0xFF 0x7F |
0x00200000 | 0x81 0x80 0x80 0x00 |
0x08000000 | 0xC0 0x80 0x80 0x00 |
0x0FFFFFFF | 0xFF 0xFF 0xFF 0x7F |